Geowhatsup

Geospatial

There are several key libraries/object types for handling geospatial data in R.

  • sf: This is a library for handling simple feature objects. Simple features are a standard data protocol for geometry. Libraries may assume you already have an sf object, so if you’re reading data in from a csv, you may need to use sf to do conversion, further processing (spatial joins, etc). Read more here

  • tigris: This is an interface to the census bureau’s TIGER shapefiles. Manually keeping and loading shapefiles in a pain in the butt. These download as sf objects, which is nice so you don’t have to do too much conversion.

The rest is kind of a hodgepodge of packages for helping you deal with the data. There are really so many different packages meant for mapping data, that it really depends on your familiarity with what you want to use for actual plotting.

pacman::p_load(dplyr, ggplot2, tidygeocoder, leaflet, sf, zipcodeR, tigris, tmap, RColorBrewer, here, readxl, RPostgres, DBI, keyring, spdep, gridExtra)

If tmap is being annoying, you may need to pacman::p_load(XML) first… It’s weird!

Some online books that are helpful:

https://r-spatial.org/book/

https://r.geocompx.org/

add note sp outdated

Up and running with base maps

If you are running analysis at some arbitrary geospatial level, you will typically aggregate that to that level and then join it to a sf object for mapping. tigris is super easy to use and a good way to get those base maps.

texas.counties <- 
  tigris::counties(cb=FALSE, state = "TX") 
Retrieving data for the year 2022

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=======================                                               |  34%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |==============================                                        |  44%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=====================================                                 |  54%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |============================================                          |  64%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |=================================================================     |  94%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%
names(texas.counties)
 [1] "STATEFP"  "COUNTYFP" "COUNTYNS" "GEOID"    "NAME"     "NAMELSAD"
 [7] "LSAD"     "CLASSFP"  "MTFCC"    "CSAFP"    "CBSAFP"   "METDIVFP"
[13] "FUNCSTAT" "ALAND"    "AWATER"   "INTPTLAT" "INTPTLON" "geometry"

Notice the geometry column! GEOID is the FIPS code, which is just the other two FIPS codes put together.

class(texas.counties)
[1] "sf"         "data.frame"

We have a sf object. Which you can think of as a data.frame with additional attributes like a coordinate reference system (CRS). That means how the world is projected from 3 dimensions to 2. It matters and it doesn’t… Often times you will get weird errors about CRS… But as we can see, this is a NAD83 CRS:

st_crs(texas.counties)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

ggplot

ggplot may be the best bet if you are already familiar with the syntax. geom_sf is the key change that is different than what you might use.

ggplot(texas.counties) + 
  geom_sf(aes(fill=AWATER))

tmaps

tmaps is also good for plotting. You’ll see it makes some assumptions about breaks in continuous variables

tm_shape(texas.counties) + 
  tm_polygons("AWATER") 

leaflet

If you want to make interactive maps (if you end up making Shiny dashboards, you might use leaflet) leaflet is a good choice.

leaflet map code can get pretty complex, and might require a lot more finagling. We will get an error related to the CRS! But we will ignore it for now.

pal_fun <- colorQuantile("YlOrRd", NULL, n = 5)


leaflet(texas.counties) %>%
  addPolygons(
    stroke = FALSE,
    fillColor = ~pal_fun(AWATER),
    fillOpacity = 0.8,
    popup = texas.counties$NAMELSAD) %>%
  addTiles() 
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'

others

That said, there is also plotly for interactive maps and tamps has interactive as well. Plotly plays very nicely with ggplot.

Honestly, most of the times with the maps is spent trying to get them to look right. Usually data is distrusted so weirdly that you have to work with getting it to look right, whether that be quartiles, deciles, whatever.

Zip Codes

Zip codes are mailing routes maintained by the USPS, they are not locations and they are not administrative boundaries, such as counties. They can be thought of as lines made of points of all the addresses along that route.

ZCTA

ZCTA https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html are areas, created by the census, that often, but do not always line up with zip codes.

Let’s get a base map of ZCTA, just to see:

(the cb parameter is about how detailed the shape is. TRUE is a more “generalized file” while FALSE is the full data (a larger object in memory) – what is available for what year depends…)

# Get Texas ZCTA
zcta_tx <- tigris::zctas(cb=FALSE, year=2010, state = "TX") 
ZCTAs can take several minutes to download.  To cache the data and avoid re-downloading in future R sessions, set `options(tigris_use_cache = TRUE)`

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=======================                                               |  34%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |==============================                                        |  44%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=====================================                                 |  54%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |============================================                          |  64%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |=================================================================     |  94%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%

As you can see, it looks quite different than a county map. If there are no people in an area, then there’s no geometry in that area. And, there are quite a bit of them. Sometimes ZCTA is useful for creating custom service areas, often called catchment areas. There is a lot more complicated polygons, so it takes longer to map.

ggplot(zcta_tx) + 
  geom_sf(aes(fill=AWATER10))

zipcodeR

zipcodeR is a package that is sometimes useful. It isn’t very telling on the github about how it got all the data, but I’m pretty sure it just uses HUD data, which we have in the crosswalk schema on GP.

data.path <- here("data")
address.file <- here(data.path, "address_file.xlsx")

address.df <- read_excel(address.file, na = "NA") 
zips <- reverse_zipcode(address.df$zip)
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 75036
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 75072
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 75288
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 79143
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA

Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 2451
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 2109
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 2906
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 1776
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code 1545
Warning in reverse_zipcode(address.df$zip): No data found for ZIP code NA
names(zips) 
 [1] "zipcode"                 "zipcode_type"           
 [3] "major_city"              "post_office_city"       
 [5] "common_city_list"        "county"                 
 [7] "state"                   "lat"                    
 [9] "lng"                     "timezone"               
[11] "radius_in_miles"         "area_code_list"         
[13] "population"              "population_density"     
[15] "land_area_in_sqmi"       "water_area_in_sqmi"     
[17] "housing_units"           "occupied_housing_units" 
[19] "median_home_value"       "median_household_income"
[21] "bounds_west"             "bounds_east"            
[23] "bounds_north"            "bounds_south"           

You can use it to get zip codes, for example, if you wanted to validate what someone says is a list of zips from Texas

# Get Texas zip codes 
tx.zips <- search_state("TX")

# Check a zip code
"53333" %in% tx.zips$zipcode
[1] FALSE

Crosswalks

This count of MS ppl per year was pulled with state = TX from data warehouse enrollment yearly, and I have a feeling many of the zip codes will not be in Texas!

prev.file <- here(data.path, "ms-zip5.csv")
prev.df <- read.csv(prev.file)
prev.df$zip5 <- as.character(prev.df$zip5)
head(prev.df)
  year  zip5 denom cases
1 2020 76225  1659     9
2 2020 75234 14482    14
3 2016 75455 12897    18
4 2018 33905     2     0
5 2021 75224 19461    19
6 2019 47909     1     0

Let’s see how many are in Texas according to zipcodeR

prev.df %>%
  select(zip5) %>% distinct() %>%
  mutate(in_tx = case_when(zip5  %in% tx.zips$zipcode ~ 1, TRUE ~ 0)) %>%
  summarise(n = n(), in_texas = sum(in_tx)) 
     n in_texas
1 5988     2481

HUD Crosswalks

We keep several HUD USPS Zip crosswalks on the greenplum server. They create these from the secret USPS files that have every address as a point location. What is special about them is that they have the ratios to account for a zip that exists in multiples of the other geographic area you are crosswalking to…

tac <- dbConnect(RPostgres::Postgres(),
                 dbname = "uthealth",
                 user = "jwozny",
                 password = key_get("Greenplum", "jwozny"),
                 host = "greenplum01.corral.tacc.utexas.edu")

Let’s get the HUD zip codes just to see how many from the enrollment table (when it was theoretically limited to TX) are actually in TX.

TX <- "TX"

tx.zips.hud <- 
dbGetQuery(tac,
  glue::glue_sql("select distinct zip 
                    from crosswalk.zip_county 
                   where usps_zip_pref_state = {TX};", .con = tac))

prev.df %>%
  select(zip5) %>% distinct() %>%
  mutate(in_tx = case_when(zip5  %in% tx.zips.hud$zip ~ 1, TRUE ~ 0)) %>%
  summarise(n = n(), in_texas = sum(in_tx)) 
     n in_texas
1 5988     2414

So, a few less than is in the zipcodeR list, but HUD knows there are a few less zip codes. For example, some PO boxes. If you really wanted to make sure you got EVERY single zip code, you might need to iterate and use different sources for crosswalking, which may not be worth it.

zip_county.crosswalk <- 
dbGetQuery(tac,
  glue::glue_sql("select *
                    from crosswalk.zip_county 
                   where usps_zip_pref_state = {TX}
                     and year >= 2016 ;", .con = tac)) 

zip_county.crosswalk <- zip_county.crosswalk %>% arrange(zip, year)

There are several ratio variables. These mean the percentage of the addresses for that ZIP you could find in the crosswalked area. res_ratio is for residences, bus for businesses, and so on. If the ratios are 1.0, that means that zip code only maps to one county. So you could assign a person to a county without worry. Or assign an observation to a county.

zip_county.crosswalk %>% head(20) %>% flextable::flextable()

zip

county_fips

res_ratio

bus_ratio

oth_ratio

tot_ratio

filename

year

usps_zip_pref_city

usps_zip_pref_state

last_update

73301

48,453

0

1

0

1

ZIP_COUNTY_122016.xlsx

2,016

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122017.xlsx

2,017

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122018.xlsx

2,018

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122019.xlsx

2,019

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122020.xlsx

2,020

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122021.xlsx

2,021

AUSTIN

TX

2023-04-05

73301

48,453

0

1

0

1

ZIP_COUNTY_122021.xlsx

2,022

AUSTIN

TX

2024-04-10

73960

48,421

1

1

0

1

ZIP_COUNTY_122016.xlsx

2,016

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122017.xlsx

2,017

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122018.xlsx

2,018

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122019.xlsx

2,019

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122020.xlsx

2,020

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122021.xlsx

2,021

TEXHOMA

TX

2023-04-05

73960

48,421

1

1

0

1

ZIP_COUNTY_122021.xlsx

2,022

TEXHOMA

TX

2024-04-10

75001

48,113

1

1

1

1

ZIP_COUNTY_122016.xlsx

2,016

ADDISON

TX

2023-04-05

75001

48,113

1

1

1

1

ZIP_COUNTY_122017.xlsx

2,017

ADDISON

TX

2023-04-05

75001

48,113

1

1

1

1

ZIP_COUNTY_122018.xlsx

2,018

ADDISON

TX

2023-04-05

75001

48,113

1

1

1

1

ZIP_COUNTY_122019.xlsx

2,019

ADDISON

TX

2023-04-05

75001

48,113

1

1

1

1

ZIP_COUNTY_122020.xlsx

2,020

ADDISON

TX

2023-04-05

75001

48,113

1

1

1

1

ZIP_COUNTY_122021.xlsx

2,021

ADDISON

TX

2023-04-05

BUT! They are not all like that. Sometimes zip codes are split across multiple counties.

zip_county.crosswalk %>% filter(zip == "75080") %>% head() %>% flextable::flextable()

zip

county_fips

res_ratio

bus_ratio

oth_ratio

tot_ratio

filename

year

usps_zip_pref_city

usps_zip_pref_state

last_update

75080

48,085

0.2577461

0.08920354

0.2301829

0.2362139

ZIP_COUNTY_122016.xlsx

2,016

RICHARDSON

TX

2023-04-05

75080

48,113

0.7422539

0.91079646

0.7698171

0.7637861

ZIP_COUNTY_122016.xlsx

2,016

RICHARDSON

TX

2023-04-05

75080

48,113

0.7502586

0.91030133

0.7706910

0.7700196

ZIP_COUNTY_122017.xlsx

2,017

RICHARDSON

TX

2023-04-05

75080

48,085

0.2497414

0.08969867

0.2293090

0.2299804

ZIP_COUNTY_122017.xlsx

2,017

RICHARDSON

TX

2023-04-05

75080

48,113

0.7081089

0.90565383

0.7654135

0.7341257

ZIP_COUNTY_122018.xlsx

2,018

RICHARDSON

TX

2023-04-05

75080

48,085

0.2918911

0.09434617

0.2345865

0.2658743

ZIP_COUNTY_122018.xlsx

2,018

RICHARDSON

TX

2023-04-05

zip_county.crosswalk %>% filter(zip == "75449") %>% arrange(zip, year) %>% head(10)%>% flextable::flextable()

zip

county_fips

res_ratio

bus_ratio

oth_ratio

tot_ratio

filename

year

usps_zip_pref_city

usps_zip_pref_state

last_update

75449

48,147

0.854033291

0.96551724

1

0.864485981

ZIP_COUNTY_122016.xlsx

2,016

LADONIA

TX

2023-04-05

75449

48,231

0.145966709

0.03448276

0

0.135514019

ZIP_COUNTY_122016.xlsx

2,016

LADONIA

TX

2023-04-05

75449

48,147

0.853566959

0.96491228

1

0.863375431

ZIP_COUNTY_122017.xlsx

2,017

LADONIA

TX

2023-04-05

75449

48,231

0.146433041

0.03508772

0

0.136624569

ZIP_COUNTY_122017.xlsx

2,017

LADONIA

TX

2023-04-05

75449

48,147

0.850122850

0.96491228

1

0.859887006

ZIP_COUNTY_122018.xlsx

2,018

LADONIA

TX

2023-04-05

75449

48,231

0.146191646

0.03508772

0

0.136723164

ZIP_COUNTY_122018.xlsx

2,018

LADONIA

TX

2023-04-05

75449

48,119

0.003685504

0.00000000

0

0.003389831

ZIP_COUNTY_122018.xlsx

2,018

LADONIA

TX

2023-04-05

75449

48,231

0.153471376

0.03448276

0

0.143337066

ZIP_COUNTY_122019.xlsx

2,019

LADONIA

TX

2023-04-05

75449

48,147

0.846528624

0.96551724

1

0.856662934

ZIP_COUNTY_122019.xlsx

2,019

LADONIA

TX

2023-04-05

75449

48,147

0.840718563

0.96491228

1

0.850993377

ZIP_COUNTY_122020.xlsx

2,020

LADONIA

TX

2023-04-05

Full Information Zip -> County Join

So, if you are interested in an unbiased estimate by year and for each county, you would want to go the “full distance” in using the HUD table ratio variables as weights to estimate county totals. You join and multiply and then collapse on county to get your info.

You have to be careful with sum if there are NA values. IDK why the NA values are there for ratio sometimes, but I didn’t filter them on getting the OG table. So, if you are getting lots of counties that seem to be missing data after using _ratio variables to create a weighted count from a crosswalk, double check the NA situation.

prev.crosswalk <- 
prev.df %>%
  inner_join(., zip_county.crosswalk, 
             by = c("zip5"="zip","year"="year")) %>%
  mutate(county_cases = cases * tot_ratio, 
         county_denom = denom * tot_ratio)

prev.county <- 
prev.crosswalk %>%
  group_by(county_fips) %>%
  summarise(denom = sum(county_denom, na.rm = TRUE), 
            cases = sum(county_cases, na.rm = TRUE)) %>%
  mutate(county_fips = as.character(county_fips))

Because of occasional sparisity issues, which we’ll talk about later, it is usually safer to left join from the geospatial sf object to make sure you don’t miss any counties.

prev.county.sf <- 
texas.counties %>%
  left_join(x=., y=prev.county, by=c("GEOID"="county_fips")) %>%
  mutate(prev = cases / denom * 100000)

county.plot.1 <- 
ggplot(prev.county.sf) + 
  geom_sf(aes(fill=prev)) + 
  theme_minimal() +
    theme(legend.key.height = unit(2, "cm"),
          legend.text = element_text(size = 10),
          legend.title = element_text(size = 12, face = "bold"),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          axis.line = element_blank(),
          plot.title = element_text(hjust = 0.5, size = 18, face="bold"))

county.plot.1

Not Full Information

There is a table in reference tables that has basically taken the geographies with the highest ratios for each zip code and put them into one table. So if we want to map a zip to just one county, we would use this table. Of course, we lose information.

one.to.one.crosswalk <- 
dbGetQuery(tac,
  glue::glue_sql("select *
                    from reference_tables.ref_zip_code 
                   where state = {TX};", .con = tac)) 

one.to.one.crosswalk <- 
one.to.one.crosswalk %>%
  mutate(county_fips = as.character(county_fips))

one.to.one.crosswalk %>% head() %>% flextable::flextable()

zip

county_fips

county_name

zcta

city

state

cbsa

last_update

79712

48329

Midland County

79705

MIDLAND

TX

33,260

2023-06-27

79331

48115

Dawson County

79331

LAMESA

TX

29,500

2023-06-27

76820

48319

Mason County

76820

ART

TX

99,999

2023-06-27

76693

48161

Freestone County

76693

WORTHAM

TX

99,999

2023-06-27

76883

48267

Kimble County

76849

TELEGRAPH

TX

99,999

2023-06-27

77475

48089

Colorado County

77475

SHERIDAN

TX

99,999

2023-06-27

So we’ll join it the same as before. There’s no ratio allocation like before, we’re just assuming all of the observations in one 5 digit zip code will apply to one county.

prev.county.2 <- 
prev.df %>%
  inner_join(., one.to.one.crosswalk, by = c("zip5"="zip")) %>%
  group_by(county_fips) %>%
  summarise(denom = sum(denom), cases = sum(cases)) %>%
  mutate(county_fips = as.character(county_fips))


prev.county2.sf <- 
texas.counties %>%
  left_join(x=., y=prev.county.2, by=c("GEOID"="county_fips")) %>%
  mutate(prev = cases / denom * 100000)

county.plot.2 <- 
ggplot(prev.county2.sf) + 
  geom_sf(aes(fill=prev)) + 
  theme_minimal() +
    theme(legend.key.height = unit(2, "cm"),
          legend.text = element_text(size = 10),
          legend.title = element_text(size = 12, face = "bold"),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          axis.line = element_blank(),
          plot.title = element_text(hjust = 0.5, size = 18, face="bold"))

county.plot.2

Full versus partial

Let’s see how different they are:

grid.arrange(county.plot.1, county.plot.2,
             ncol=2, nrow=1)

full <-
prev.county2.sf %>% as.data.frame() %>% select(GEOID, prev)

one.to.one <- 
prev.county.sf %>% as.data.frame() %>% select(GEOID, prev)

compare <- 
left_join(x=full, y=one.to.one, by="GEOID")

plot(compare$prev.x, compare$prev.y)

I assume that the more granular you get–if you are then stratifying by many variables–the more some bias would become apparent, but for a range of years, it doesn’t seem to affect it much. It depends on the question. The ratios don’t appear to change a whole over time for the allocation weights, so you could also just take the most recent year from the original crosswalk table and use that.

Public Health Regions

Public health regions are groupings of counties. Texas has lots of counties and sometimes data can be very sparse on the county level, so we need some higher level of aggregation, which is often public health regions. You can get what table from GP.

ph.region.crosswalk <- dbGetQuery(tac, "select fips_code as county_fips, public_health_region from reference_tables.ref_tx_county_regions")

You just union geometries as you would a sum() in summarize()

ph.region.crosswalk <- 
ph.region.crosswalk %>%
  mutate(county_fips = as.character(county_fips))

prev.region.sf <- 
prev.county.sf %>%
  left_join(x=., y=ph.region.crosswalk, by=c("GEOID"="county_fips")) %>%
  group_by(public_health_region) %>%
  summarise(denom = sum(denom, na.rm = TRUE), 
            cases = sum(cases, na.rm = TRUE), 
            geometry = st_union(geometry)) %>% 
  mutate(prev = (cases / denom) * 100000)

ggplot(prev.region.sf) + 
  geom_sf(aes(fill=prev)) + 
  theme_minimal() +
    theme(legend.key.height = unit(2, "cm"),
          legend.text = element_text(size = 10),
          legend.title = element_text(size = 12, face = "bold"),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          axis.line = element_blank(),
          plot.title = element_text(hjust = 0.5, size = 18, face="bold"))

A Note About Sparsity

Because Texas has so many counties, and there are many counties where very few people live, we often run into sparsity issues, where we have low counts. And you have the issue that variance of estimates is tied to the population size:

plot(log(prev.county.sf$denom), prev.county.sf$prev)

Since we’re just summing all years for all groups over counties, we haven’t reallllly run into the problem, but as soon as you get into age groups and by year, you will run into the problem of having lots of empty cells in your geometry. If you’re trying to show a spatial distribution and find pockets, you can end up with a messy looking map. You may need to look into smoothing, such as empirical bayes smoothers or building hierarchical models using public health region or something like that. In that case, you’re basically modeling it and using the predicted rate as your mapped value. All these methods are going to shift high variance areas toward either a global or local mean and leave low variance areas as they are.

We often censor areas if there are <=10 observations there, which can leave a lot of blanks. Giving the estimated rate can be a good workaround for that.

Points in Polygons

Geocoding

Sometimes you have address data and you want to get a lat/longitude. Ideally, we would just have an address on everyone and could geocode them and count the points within an area… but alas…. We will use tidygeocoder. It has several “modes”, some of which are faster than others… It is definitely not a mass solution if you need to geocode 100,000 addresses, but in a pinch it will do. I think the limit may actually be 10,000 for the “census” mode, and the other modes would be too slow anyways.

This is data that was used for opioid providers, so I’m just repurposing it for demonstration purposes. It has been somewhat cleaned beforehand standardize the addresses.

# Make full address field
# Only need this if using ARCGIS service
address.df <- 
address.df %>%
  mutate(full_address = glue::glue("{street}, {city}, {state} {zip}", .na=""))

# Send to geocoder
geocoder.census <- 
  address.df %>% 
  geocode(street = street, city = city, state = state, postalcode = zip,
          method = "census",
          full_results = TRUE)
Passing 2,292 addresses to the US Census batch geocoder
Query completed in: 57.1 seconds
# Keep those matched 
geocoded.census <- 
geocoder.census %>%
  filter(!is.na(lat))

So now you have to convert it to an SF object. Here’s where the CRS matters

geocoded.census.sf <- 
  st_as_sf(geocoded.census, coords = c("long", "lat"), crs = 4269)

Now they have an actual geometry

ggplot(geocoded.census.sf) +
  geom_sf(aes())

Well, look there’s some out of texas and we want to know what county they are in, so we need to do some work.

geocoded.census.sf.tx <- 
  st_join(geocoded.census.sf, texas.counties, join = st_within) %>% 
  filter(!is.na(GEOID))

ggplot(geocoded.census.sf.tx) +
  geom_sf(aes())

We can layer things on the same map in ggplot

ggplot(texas.counties) +
  geom_sf() + 
  geom_sf(data = geocoded.census.sf.tx)

If you wanted to count within you might have to remake the join.

# You need to specify left joins in the function call for st_join
providers.in.counties <- 
st_join(texas.counties, geocoded.census.sf, left=TRUE, join= st_contains)

# Count the clinics 
aggregate.clinics.in.counties <- 
providers.in.counties %>%
  mutate(clinic = if_else(is.na(name),0,1)) %>%
  group_by(GEOID) %>%
  summarise(
    clinic_count = sum(clinic),
    geometry = st_union(geometry)
  )

ggplot(aggregate.clinics.in.counties) +
  geom_sf(aes(fill=clinic_count))

Storing GIS in Database

You can store geospatial data in Postgres using PostGis and if you wanted to save the results of the geospatial analysis in a table that retains the geometry, you can do that, it just takes a bit of wrangling.

county.gp <- dbGetQuery(tac,"select * from gis.county;")
class(county.gp)
[1] "data.frame"
county.gp$geom <- st_as_sfc(county.gp$geom) 
county.gp.sf <- st_as_sf(county.gp)
class(county.gp.sf)
[1] "sf"         "data.frame"
ggplot(county.gp.sf) +
  geom_sf(aes(fill=shape_area))